A Phrase-Based, Joint Probability Model for Statistical Machine Translation

Authors

  • Daniel Marcu
  • William Wong
Abstract

We present a joint probability model for statistical machine translation, which automatically learns word and phrase equivalents from bilingual corpora. Translations produced with parameters estimated using the joint model are more accurate than translations produced using IBM Model 4.

1 Motivation

Most of the noisy-channel-based models used in statistical machine translation (MT) (Brown et al., 1993) are conditional probability models. In the noisy-channel framework, each source sentence e in a parallel corpus is assumed to "generate" a target sentence f by means of a stochastic process whose parameters are estimated using traditional EM techniques (Dempster et al., 1977). The generative model explains how source words are mapped into target words and how target words are re-ordered to yield well-formed target sentences. A variety of methods are used to account for the re-ordering stage: word-based (Brown et al., 1993), template-based (Och et al., 1999), and syntax-based (Yamada and Knight, 2001), to name just a few. Although these models use different generative processes to explain how translated words are re-ordered in a target language, at the lexical level they are quite similar: all of them assume that source words are individually translated into target words (the individual words may include a non-existent element, called NULL).

We suspect that MT researchers have so far chosen to automatically learn translation lexicons defined only over words primarily for pragmatic reasons. Large-scale bilingual corpora with vocabularies in the range of hundreds of thousands of words yield very large translation lexicons. Tuning the probabilities associated with these large lexicons is a difficult enough task to deter one from trying to scale up to learning phrase-based lexicons. Unfortunately, trading explanatory power for smaller space requirements and higher efficiency often yields non-intuitive results.

Consider, for example, the parallel corpus of three sentence pairs shown in Figure 1. Intuitively, if we allow any Source words to be aligned to any Target words, the best alignment we can come up with is the one in Figure 1.c. Sentence pair (S2, T2) offers strong evidence that "b c" in language S means the same thing as "x" in language T. On the basis of this evidence, we expect the system to also learn from sentence pair (S1, T1) that "a" in language S means the same thing as "y" in language T. Unfortunately, with translation models that do not allow Target words to be aligned to more than one Source word, as is the case in the IBM models (Brown et al., 1993), it is impossible to learn that the phrase "b c" in language S means the same thing as the word "x" in language T. IBM Model 4 (Brown et al., 1993), for example, converges to the word alignments shown in Figure 1.b and learns the translation probabilities shown in Figure 1.a (to train the IBM-4 model, we used Giza (Al-Onaizan et al., 1999)). Since in the IBM model one cannot link a Target word to more than one Source word, the training procedure yields unintuitive translation probabilities. (Note that another good word-for-word model is one that assigns high probability to p(x | b) and p(z | b) and low probability to p(x | c).)

[Figure 1: Alignments and probability distributions in IBM Model 4 and our joint phrase-based model. Corpus: S1 "a b c" / T1 "x y"; S2 "b c" / T2 "x"; S3 "b" / T3 "z". IBM-4 t-table: p(y | a) = 1, p(x | c) = 1, p(z | b) = 0.98, p(x | b) = 0.02. Joint t-table: p(x, b c) = 0.34, p(y, a) = 0.01, p(x y, a b c) = 0.32, p(z, b) = 0.33. Corresponding conditional table: p(x y | a b c) = 1, p(x | b c) = 1, p(y | a) = 1, p(z | b) = 1. The figure's panels show the intuitive, joint-model, and IBM-4 alignments over this corpus.]
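The conditional table above can be recovered from the joint t-table by marginalization, p(f̄ | ē) = p(ē, f̄) / Σ_f̄′ p(ē, f̄′). Here is a minimal Python sketch of that computation (our illustration, not the paper's code; the numbers are transcribed from Figure 1):

    # Derive the conditional table in Figure 1 from its joint t-table by
    # marginalizing over target phrases: p(f | e) = p(e, f) / sum_f' p(e, f').
    joint = {
        ("b c", "x"): 0.34,
        ("a", "y"): 0.01,
        ("a b c", "x y"): 0.32,
        ("b", "z"): 0.33,
    }

    marginal = {}  # total joint mass of each source phrase
    for (e, f), p in joint.items():
        marginal[e] = marginal.get(e, 0.0) + p

    # Each source phrase appears in exactly one joint entry, so every
    # conditional comes out as 1, matching the "corresponding conditional table".
    for (e, f), p in joint.items():
        print(f"p({f} | {e}) = {p / marginal[e]:.2f}")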
In this paper, we describe a translation model that assumes that lexical correspondences can be established not only at the word level, but at the phrase level as well. In contrast with many previous approaches (Brown et al., 1993; Och et al., 1999; Yamada and Knight, 2001), our model does not try to capture how Source sentences can be mapped into Target sentences, but rather how Source and Target sentences can be generated simultaneously. In other words, in the style of Melamed (2001), we estimate a joint probability model that can be easily marginalized to yield conditional probability models for both source-to-target and target-to-source machine translation applications. The main difference between our work and that of Melamed is that we learn joint probability models of translation equivalence not only between words but also between phrases, and we show that these models can be used not only for the extraction of bilingual lexicons but also for the automatic translation of unseen sentences.

In the rest of the paper, we first describe our model (Section 2) and explain how it can be implemented and trained (Section 3). We briefly describe a decoding algorithm that works in conjunction with our model (Section 4) and evaluate the performance of a translation system that uses the joint-probability model (Section 5). We end with a discussion of the strengths and weaknesses of our model as compared to other models proposed in the literature.

2 A Phrase-Based Joint Probability Model

2.1 Model 1

In developing our joint probability model, we started out with a very simple generative story. We assume that each sentence pair in our corpus is generated by the following stochastic process:

1. Generate a bag of concepts C.
2. For each concept c_i ∈ C, generate a pair of phrases (ē_i, f̄_i) according to the distribution t(ē_i, f̄_i), where ē_i and f̄_i each contain at least one word.
3. Order the phrases generated in each language so as to create two linear sequences of phrases; these sequences correspond to the sentence pairs in a bilingual corpus.

For simplicity, we initially assume that the bag of concepts and the ordering of the generated phrases are modeled by uniform distributions. We do not assume that c_i is a hidden variable that generates the pair (ē_i, f̄_i), but rather that c_i = (ē_i, f̄_i). Under these assumptions, the probability of generating a sentence pair (E, F) using a bag of concepts C is given by the product of all phrase-to-phrase translation probabilities, ∏_{c_i ∈ C} t(ē_i, f̄_i), for bags of phrases that can be ordered linearly so as to obtain the sentences E and F. For example, the sentence pair ("a b c", "x y") can be generated using two concepts, ("a b" : "y") and ("c" : "x"), or using one concept, ("a b c" : "x y"), because in both cases the phrases in each language can be arranged in a sequence that yields the original sentence pair.
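The generative story itself is straightforward to simulate. Below is a minimal sketch, assuming a toy distribution t whose entries and values are purely illustrative (not learned parameters):

    import random

    # Assumed toy joint phrase-translation distribution t(e_phrase, f_phrase);
    # the entries and values are illustrative, not learned from data.
    t = {
        ("a b", "y"): 0.25,
        ("c", "x"): 0.25,
        ("a b c", "x y"): 0.30,
        ("b", "z"): 0.20,
    }

    def generate_sentence_pair(num_concepts):
        # Steps 1-2: draw a bag of concepts, each identified with a phrase
        # pair (e_i, f_i) drawn according to t.
        concepts = random.choices(list(t), weights=list(t.values()), k=num_concepts)
        # Step 3: order each language's phrases independently and uniformly.
        e_phrases = [e for e, _ in concepts]
        f_phrases = [f for _, f in concepts]
        random.shuffle(e_phrases)
        random.shuffle(f_phrases)
        return " ".join(e_phrases), " ".join(f_phrases)

    print(generate_sentence_pair(2))  # e.g. ('c a b', 'y x')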
Returning to the example: the same sentence pair cannot be generated using the concepts ("a b" : "y") and ("c" : "y"), because the sequence "x y" cannot be recreated from the two phrases "y" and "y". Similarly, the pair cannot be generated using the concepts ("a c" : "x") and ("b" : "y"), because the sequence "a b c" cannot be created by concatenating the phrases "a c" and "b". We say that a set of concepts C can be linearized into a sentence pair (E, F) if E and F can be obtained by permuting the phrases ē_i and f̄_i that characterize all concepts c_i ∈ C. We denote this property using the predicate L(E, F, C). Under this model, the probability of a given sentence pair (E, F) is then obtained by summing over all bags of concepts that can be linearized to (E, F):

p(E, F) = Σ_{C : L(E, F, C)} ∏_{c_i ∈ C} t(ē_i, f̄_i)
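Below is a brute-force sketch of the predicate L and the sum above, reusing the illustrative toy distribution t from the previous sketch. Exhaustive enumeration over bags and permutations is feasible only for toy inputs; this is our illustration, not the paper's training procedure:

    from itertools import combinations_with_replacement, permutations
    from math import prod

    t = {("a b", "y"): 0.25, ("c", "x"): 0.25, ("a b c", "x y"): 0.30, ("b", "z"): 0.20}

    def linearizable(sentence, phrases):
        # One side of the check: can some ordering of the phrases be
        # concatenated to reproduce the sentence?
        return any(" ".join(p) == sentence for p in permutations(phrases))

    def L(E, F, concepts):
        # The predicate L(E, F, C): both sides must be recoverable by
        # permuting and concatenating the concepts' phrases.
        return (linearizable(E, [e for e, _ in concepts])
                and linearizable(F, [f for _, f in concepts]))

    def p_joint(E, F, max_concepts=3):
        # p(E, F) = sum over bags C with L(E, F, C) of the product t(e_i, f_i).
        return sum(
            prod(t[c] for c in bag)
            for k in range(1, max_concepts + 1)
            for bag in combinations_with_replacement(t, k)
            if L(E, F, bag)
        )

    # Both decompositions from the text contribute:
    # {("a b", "y"), ("c", "x")} and {("a b c", "x y")} -> 0.25 * 0.25 + 0.30
    print(p_joint("a b c", "x y"))  # 0.3625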
